Utterance Clustering Using Stereo Audio Channels
Abstract
Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by using multichannel (stereo) audio signals. Processed signals were generated by combining the left- and right-channel signals in a few different ways and then extracting embedded features (also called d-vectors) from those processed signals. The study applied a Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Results of experiments with real recordings of multiperson discussion sessions showed that the proposed method, which used multichannel audio signals, achieved significantly better performance than a conventional method with mono-audio signals in more complicated conditions.
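The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the averaged sum and difference stand in for the unspecified "few different ways" of combining channels; plain feature vectors stand in for d-vectors, which in practice come from a trained speaker-embedding network; and a single diagonal Gaussian per speaker with one pooled variance stands in for the parameter-sharing Gaussian mixture model. All function names are hypothetical.

```python
import math


def combine_channels(left, right):
    """Build several processed signals from a stereo pair.

    The sum/difference combinations are illustrative assumptions; the
    source does not specify which combinations were used.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return {
        "left": list(left),
        "right": list(right),
        "sum": [(l + r) / 2 for l, r in zip(left, right)],
        "diff": [(l - r) / 2 for l, r in zip(left, right)],
    }


def fit_speaker_models(train):
    """Fit one mean per speaker and one pooled diagonal variance.

    train maps a speaker label to a list of feature vectors (stand-ins
    for d-vectors). Sharing the variance across speakers is one simple
    reading of "parameter sharing", assumed here for illustration.
    """
    dim = len(next(iter(train.values()))[0])
    means = {
        spk: [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        for spk, vecs in train.items()
    }
    total = sum(len(vecs) for vecs in train.values())
    var = []
    for d in range(dim):
        sq = sum(
            (v[d] - means[spk][d]) ** 2
            for spk, vecs in train.items()
            for v in vecs
        )
        var.append(sq / total + 1e-6)  # small floor avoids zero variance
    return means, var


def log_likelihood(x, mean, var):
    """Log-density of x under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
        for xi, m, s in zip(x, mean, var)
    )


def detect_speaker(x, means, var):
    """Return the speaker whose model gives x the maximum likelihood."""
    return max(means, key=lambda spk: log_likelihood(x, means[spk], var))


# Toy usage with made-up two-dimensional features.
train = {
    "A": [[0.0, 0.1], [0.2, -0.1]],
    "B": [[1.0, 1.1], [0.9, 1.2]],
}
means, var = fit_speaker_models(train)
print(detect_speaker([0.1, 0.0], means, var))
```

A full mixture model with several components per speaker would replace `fit_speaker_models` and `log_likelihood`; the maximum-likelihood selection step in `detect_speaker` would stay the same.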
Similar resources
Automatic Music Clustering using Audio Attributes
Music brings people together; it allows us to experience the same emotions. Currently, musical genre classification is done manually and demands considerable effort even from the trained human ear. Therefore, clustering songs automatically and then drawing valuable insights from those clusters is an interesting problem and can add great value to music information retrieval systems. Most of ...
Parametric Coding of Stereo Audio
Parametric-stereo coding is a technique to efficiently code a stereo audio signal as a monaural signal plus a small amount of parametric overhead to describe the stereo image. The stereo properties are analyzed, encoded, and reinstated in a decoder according to spatial psychoacoustical principles. The monaural signal can be encoded using any (conventional) audio coder. Experiments show that the...
Speaker indexing in audio archives using test utterance Gaussian mixture modeling
Speaker Indexing has recently emerged as an important task due to the rapidly growing volume of audio archives. Current filtration techniques still suffer from problems both in accuracy and efficiency. The major reason for the drawbacks of existing solutions is the use of inaccurate anchor models. The contribution of this paper is two-fold. On the theoretical side, a new method is developed for...
High Quality Scalable Stereo Audio Coding
This paper proposes an efficient, low complexity, scalable audio coder based on a combination of two embedded coding algorithms: the SPIHT (set partitioning in hierarchical trees) coding algorithm [1] and an embedded, nested binary set partitioning (NBSP) algorithm. The SPIHT algorithm, considered to be the premier state-of-the-art algorithm in still image compression, is used for the low frequ...
Distributed Virtual Conference Stereo Audio Reconstruction
To enhance the realism of virtual conferences, this paper proposes a distributed virtual conference stereo audio reconstruction model. The spatial audio parameter inter-aural level difference (ILD) is used to reconstruct the spatial sound field for each listener. The distributed synthesis system is designed to achieve a lower network payload.
Journal
Journal title: Computational Intelligence and Neuroscience
Year: 2021
ISSN: 1687-5265, 1687-5273
DOI: https://doi.org/10.1155/2021/6151651